We propose Video-TransUNet, a deep architecture for segmentation in medical CT videos, built by integrating temporal feature blending into the TransUNet deep learning framework. In particular, our approach amalgamates strong frame representations via a ResNet CNN backbone, multi-frame feature blending via a Temporal Context Module (TCM), non-local attention via a Vision Transformer, and reconstructive capabilities for multiple targets via a UNet-based convolutional-deconvolutional architecture with multiple heads. We show that this new network design can significantly outperform other state-of-the-art systems when tested on the segmentation of the bolus and the pharynx/larynx in Videofluoroscopic Swallowing Study (VFSS) CT sequences. On our VFSS2022 dataset it achieves a Dice coefficient of 0.8796 and an average surface distance of 1.0379. Note that accurately tracking the pharyngeal bolus is particularly important in clinical practice, since it constitutes the primary method on which swallowing impairment diagnoses are based. Our findings suggest that the proposed model can indeed enhance the TransUNet architecture by exploiting temporal information, improving segmentation performance by a significant margin. We publish key source code, network weights, and ground truth annotations to simplify reproduction of our results.
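The multi-frame blending step can be illustrated with a minimal NumPy sketch. The softmax weighting and the `blend_frames` helper below are illustrative assumptions, not the paper's actual Temporal Context Module, which would learn the relevance scores end-to-end.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def blend_frames(frame_feats, scores):
    """Blend per-frame feature maps into one temporal context map.

    frame_feats: (T, C, H, W) features from a shared CNN backbone.
    scores:      (T,) raw relevance scores (assumed given here; a real
                 temporal context module would learn them).
    """
    weights = softmax(scores)                      # (T,), sums to 1
    return np.tensordot(weights, frame_feats, 1)  # (C, H, W)

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 16, 16))   # 4 frames, 8 channels
blended = blend_frames(feats, np.array([0.1, 0.9, 0.9, 0.1]))
```

Frames with higher relevance scores (here the two middle frames) contribute more to the blended representation.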
We address the problem of people detection in RGB-D data, where we leverage depth information to develop a region-of-interest (ROI) selection method that provides proposals to both a colour and a depth CNN. To combine the detections produced by the two CNNs, we propose a novel fusion approach based on characteristics of the depth image. We also present a new depth-encoding scheme that not only encodes the depth image into three channels but also enhances the information relevant for classification. We conduct experiments on a publicly available RGB-D people dataset and show that our approach outperforms a baseline model that uses only RGB data.
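A three-channel depth encoding in the spirit described above might look like the following sketch. The specific channels chosen (normalised depth, gradient magnitude, inverted depth) are our illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

def encode_depth_3ch(depth):
    """Encode a single-channel depth map into three channels.

    Channels chosen here as a plausible illustration:
      0: depth normalised to [0, 1]
      1: gradient magnitude (edges between people and background)
      2: inverted normalised depth (near objects appear bright)
    """
    d = depth.astype(np.float64)
    span = d.max() - d.min()
    norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    gy, gx = np.gradient(d)
    grad = np.hypot(gx, gy)
    gmax = grad.max()
    grad = grad / gmax if gmax > 0 else grad
    return np.stack([norm, grad, 1.0 - norm], axis=0)

depth = np.linspace(0.5, 4.0, 64 * 48).reshape(48, 64)  # toy depth map
encoded = encode_depth_3ch(depth)
```

The point of such an encoding is that a standard three-channel CNN (pretrained on RGB) can then consume the depth stream without architectural changes.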
We propose a novel multimodal sensor fusion approach for Ambient Assisted Living (AAL) that takes advantage of learning using privileged information (LUPI). We address two major shortcomings of standard multimodal approaches: limited area coverage and reduced reliability. Our new framework fuses the concept of modality hallucination with triplet learning to train a model on multiple modalities that can handle missing sensors at inference time. We evaluate the proposed model on inertial data from a wearable accelerometer device, using RGB video and skeletons as privileged modalities, and show an average accuracy improvement of 6.6% on the UTD-MHAD dataset and 5.5% on the Berkeley MHAD dataset, reaching a new state of the art for inertial-only classification accuracy on these datasets. We validate our framework through several ablation studies.
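The triplet-learning component of such a framework can be sketched as below. Treating an inertial embedding as the anchor, with same-action and different-action embeddings as positive and negative, is our illustrative reading, not the paper's exact training setup.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on embedding vectors.

    Pulls the anchor (e.g. an inertial embedding) towards a positive
    (e.g. a hallucinated privileged-modality embedding of the same
    action) and pushes it away from a negative (a different action).
    """
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same action, close in embedding space
n = np.array([3.0, 4.0])   # different action, far away
loss = triplet_loss(a, p, n)   # zero: margin already satisfied
```

When the positive is already closer than the negative by the margin, the loss vanishes; swapping positive and negative yields a large penalty.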
We propose a novel end-to-end curriculum learning approach for sparsely labelled animal datasets leveraging large volumes of unlabelled data to improve supervised species detectors. We exemplify the method in detail on the task of finding great apes in camera trap footage taken in challenging real-world jungle environments. In contrast to previous semi-supervised methods, our approach adjusts learning parameters dynamically over time and gradually improves detection quality by steering training towards virtuous self-reinforcement. To achieve this, we propose integrating pseudo-labelling with curriculum learning policies and show how learning collapse can be avoided. We discuss theoretical arguments, ablations, and significant performance improvements against various state-of-the-art systems when evaluating on the Extended PanAfrican Dataset holding approx. 1.8M frames. We also demonstrate our method can outperform supervised baselines with significant margins on sparse label versions of other animal datasets such as Bees and Snapshot Serengeti. We note that performance advantages are strongest for smaller labelled ratios common in ecological applications. Finally, we show that our approach achieves competitive benchmarks for generic object detection in MS-COCO and PASCAL-VOC indicating wider applicability of the dynamic learning concepts introduced. We publish all relevant source code, network weights, and data access details for full reproducibility. The code is available at https://github.com/youshyee/DCL-Detection.
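A minimal sketch of pseudo-labelling under a curriculum policy follows. The linear confidence-threshold schedule and the mock detector outputs are assumptions for illustration, not the paper's actual dynamic policy.

```python
def threshold_schedule(step, total_steps, start=0.95, end=0.7):
    """Curriculum policy: start strict, then relax the confidence
    threshold over time so more pseudo-labels are gradually admitted."""
    frac = step / max(1, total_steps - 1)
    return start + frac * (end - start)

def select_pseudo_labels(detections, threshold):
    """Keep only detections confident enough to act as pseudo-labels,
    steering training towards self-reinforcement without collapse."""
    return [(box, conf) for box, conf in detections if conf >= threshold]

# Mock detections on an unlabelled frame: (bounding box, confidence).
dets = [((10, 10, 50, 50), 0.96), ((80, 20, 120, 60), 0.80),
        ((5, 5, 15, 15), 0.40)]

early = select_pseudo_labels(dets, threshold_schedule(0, 10))  # strict
late = select_pseudo_labels(dets, threshold_schedule(9, 10))   # relaxed
```

Early in training only the most confident detection survives; later the medium-confidence one is admitted too, while low-confidence noise stays excluded, which is one way to avoid the learning collapse the abstract mentions.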
We propose a Temporal Voting Network (TVNet) for action localisation in untrimmed videos. It includes a novel Voting Evidence Module to locate temporal boundaries more accurately, in which accumulated contextual evidence is used to predict frame-level probabilities of start and end action boundaries. Our action-independent evidence module is incorporated within a pipeline to calculate confidence scores and action classes. We achieve an average mAP of 34.6% on ActivityNet-1.3, in particular outperforming previous methods at IoU 0.95. When combined with PGCN, TVNet achieves 59.1% mAP at 0.5 IoU on THUMOS14, outperforming previous work at all thresholds. Our code is available at https://github.com/hanielwang/tvnet.
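The idea of accumulating contextual votes for boundaries can be sketched as below. The fixed vote offsets and simple normalisation are illustrative assumptions rather than TVNet's learned voting mechanism.

```python
import numpy as np

def accumulate_start_votes(evidence, offsets):
    """Accumulate frame-level start-boundary votes.

    evidence: (T,) per-frame evidence that a start lies `o` frames
              earlier, for each offset o in `offsets`.
    Returns a (T,) normalised frame-level start score: each frame t
    casts a vote of weight evidence[t] for frame t - o being a start.
    """
    T = len(evidence)
    votes = np.zeros(T)
    for o in offsets:
        for t in range(T):
            if 0 <= t - o < T:
                votes[t - o] += evidence[t]
    m = votes.max()
    return votes / m if m > 0 else votes

ev = np.array([0.0, 0.1, 0.9, 1.0, 0.2, 0.0])
start_prob = accumulate_start_votes(ev, offsets=[1, 2])
```

Frames just before the strong-evidence region collect the most votes, so the predicted start boundary lands ahead of the action rather than on its peak.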
Despite the prominent success of self-supervised pretraining approaches for video representation learning, they struggle when the unlabelled pretraining dataset is small or when there is a domain discrepancy between the unlabelled data in the source task (pretraining) and the labelled data in the target task (finetuning). To mitigate these issues, we propose a novel approach that supplements self-supervised pretraining with an auxiliary pretraining phase based on knowledge similarity distillation, AuxSKD, for better generalisation with a significantly smaller amount of video data, e.g. Kinetics-100 rather than Kinetics-400. Our method iteratively distills the knowledge of a teacher model into a student model by capturing similarity information between segments of unlabelled video data. The student model then solves a pretext task by exploiting this prior knowledge. We also introduce a novel pretext task, Video Segment Pace Prediction (VSPP), which requires our model to predict the playback speed of a randomly selected segment of the input video, providing more reliable self-supervised representations. Our experimental results show performance superior to the state of the art on the UCF101 and HMDB51 datasets when pretraining on K100. Moreover, we show that our auxiliary pretraining phase, when applied to recent state-of-the-art self-supervised methods (e.g. VideoPace and RSPNet), improves their results on UCF101 and HMDB51. Our code will be released soon.
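The VSPP pretext task can be sketched as constructing a self-supervised (clip, speed-label) pair. The frame-subsampling scheme and speed set below are illustrative assumptions, not the paper's exact sampling procedure.

```python
import random

def make_vspp_sample(video, speeds=(1, 2, 4), seg_len=8, rng=None):
    """Build one Video Segment Pace Prediction training pair.

    Picks a random playback speed, subsamples a random segment of the
    frame sequence at that speed, and returns (segment, class_label).
    The model's pretext task is to recover the label from the segment.
    """
    rng = rng or random.Random()
    label = rng.randrange(len(speeds))
    speed = speeds[label]
    max_start = len(video) - seg_len * speed
    start = rng.randrange(max_start + 1)
    segment = video[start:start + seg_len * speed:speed]
    return segment, label

video = list(range(64))   # stand-in for 64 video frames
seg, label = make_vspp_sample(video, rng=random.Random(0))
```

Because the label is derived from the sampling itself, no human annotation is needed; the stride between consecutive frames in the returned segment encodes the speed class the model must predict.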
Visuals captured by high-flying aerial drones are increasingly used to evaluate biodiversity and animal population dynamics across the globe. However, despite ultra-high-resolution cameras, challenging acquisition scenarios and the tiny depiction of animals in airborne imagery have so far been limiting factors for applying computer vision detectors with high confidence. In this paper, we address the problem for the first time by combining deep object detectors with super-resolution techniques and altitude data. In particular, we show that integrating a holistic-attention-network-based super-resolution approach and a custom-built altitude data exploitation network into a standard recognition pipeline can considerably increase detection efficacy in real-world settings. We evaluate on two public, large-scale aerial-capture animal datasets, SAVMAP and AED. We find that the proposed approach consistently improves over ablated baselines and the state-of-the-art performance on both datasets. In addition, we provide a systematic analysis of the relationship between animal resolution and detection performance. We conclude that super-resolution and altitude-knowledge exploitation techniques can significantly raise benchmarks and should therefore be used routinely when detecting minutely resolved animals in aerial imagery.
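One way altitude data helps is by fixing the expected on-ground pixel size of an animal. A minimal sketch of that geometry, with illustrative camera parameters (not taken from the paper), is:

```python
def ground_sampling_distance(altitude_m, focal_mm, sensor_width_mm,
                             image_width_px):
    """Metres of ground covered by one pixel at the given altitude,
    assuming a nadir-pointing pinhole camera."""
    return (altitude_m * sensor_width_mm) / (focal_mm * image_width_px)

def expected_pixel_size(animal_size_m, altitude_m, focal_mm=35.0,
                        sensor_width_mm=36.0, image_width_px=7360):
    """Expected length of the animal in pixels; such a prior can
    condition the detector and suggest how much super-resolution
    upscaling is needed."""
    gsd = ground_sampling_distance(altitude_m, focal_mm,
                                   sensor_width_mm, image_width_px)
    return animal_size_m / gsd

# A ~2 m animal seen from 120 m with the (assumed) camera above.
px = expected_pixel_size(animal_size_m=2.0, altitude_m=120.0)
```

Doubling the flight altitude halves the expected pixel extent, which is why altitude knowledge and super-resolution interact so directly in this setting.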
We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a constant fraction of adversarially-corrupted samples.
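The core intuition of such a robustness-to-privacy reduction can be sketched informally; this is our paraphrase of the standard robustness-to-sensitivity argument, not the paper's precise statement or constants.

```latex
% Suppose the robust estimator T satisfies, for any datasets X, X'
% that differ in at most an \eta-fraction of their n samples,
\| T(X) - T(X') \| \le \alpha(\eta).
% Neighbouring datasets (one changed sample) are the \eta = 1/n case,
% so the global sensitivity of T is bounded by
\Delta T \le \alpha(1/n),
% and releasing T(X) plus noise calibrated to \Delta T / \varepsilon
% (e.g. a Laplace or Gaussian mechanism) is differentially private,
% with error dominated by the robust estimator's accuracy.
```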
A major challenge in machine learning is resilience to out-of-distribution data, that is, data that lies outside the distribution of a model's training data. Training is often performed using limited, carefully curated datasets, so when a model is deployed there is often a significant distribution shift as edge cases and anomalies not included in the training data are encountered. To address this, we propose the Input Optimisation Network, an image preprocessing model that learns to optimise input data for a specific target vision model. In this work we investigate several out-of-distribution scenarios in the context of semantic segmentation for autonomous vehicles, comparing an input-optimisation-based solution to existing approaches of finetuning the target model with augmented training data and of an adversarially trained preprocessing model. We demonstrate that our approach can enable performance on such data comparable to that of a finetuned model, and subsequently that a combined approach, whereby an input optimisation network is optimised to target a finetuned model, delivers superior performance to either method in isolation. Finally, we propose a joint optimisation approach, in which the input optimisation network and target model are trained simultaneously, which we demonstrate achieves significant further performance gains, particularly in challenging edge-case scenarios. We also demonstrate that our architecture can be reduced to a relatively compact size without a significant performance impact, potentially facilitating real-time embedded applications.
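A toy sketch of the input-optimisation idea, with a frozen linear "target model" and a trainable linear preprocessor, is given below. Everything here (linear models, squared loss, plain gradient descent) is an illustrative assumption, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))    # frozen target model f(x) = W x
x = rng.standard_normal((4, 32))   # batch of "shifted" inputs
y = W @ (x * 0.5)                  # outputs the frozen model should produce

A = np.eye(4)                      # trainable preprocessor g(x) = A x
init_err = np.mean((W @ (A @ x) - y) ** 2)

lr = 0.01
for _ in range(500):
    pred = W @ (A @ x)
    # Gradient of the mean squared loss with respect to A only;
    # the target model's weights W stay frozen throughout.
    grad = W.T @ (pred - y) @ x.T / x.shape[1]
    A -= lr * grad

final_err = np.mean((W @ (A @ x) - y) ** 2)
```

Only the preprocessor adapts to the shifted inputs; the target model is untouched, which is what distinguishes input optimisation from finetuning the target itself.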
Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this paper, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering and a span scorer module, in which we use pre-trained models from the existing literature, and therefore, our metric can be used without further training. We show that RQUGE has a higher correlation with human judgment without relying on the reference question. RQUGE is shown to be significantly more robust to several adversarial corruptions. Additionally, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on the synthetic data generated by a question generation model and re-ranked by RQUGE.
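The two-module structure of such a metric can be sketched with stub components. The keyword-matching QA model and token-F1 span scorer below are simplified placeholders for the pre-trained models the authors plug in.

```python
def stub_qa_model(question, context):
    """Placeholder for a pre-trained QA model: returns the context
    clause sharing the most words with the question."""
    words = set(question.lower().replace("?", "").split())
    spans = context.split(", ")
    return max(spans, key=lambda s: len(words & set(s.lower().split())))

def span_f1(predicted, gold):
    """Token-level F1 between a predicted answer and the gold span."""
    p, g = predicted.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    prec, rec = common / len(p), common / len(g)
    return 2 * prec * rec / (prec + rec)

def rquge_like_score(candidate_q, context, gold_answer):
    """Reference-free scoring: answer the candidate question from the
    context, then score that answer against the gold answer span."""
    return span_f1(stub_qa_model(candidate_q, context), gold_answer)

context = "Paris is the capital of France, Berlin is the capital of Germany"
score = rquge_like_score("What is the capital of France?", context, "Paris")
```

Note that no reference question appears anywhere: a candidate question is rewarded only insofar as answering it from the context recovers the gold answer, which is the key property the abstract claims.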